Skip to main content
TrustRadius
Hadoop

Hadoop

Overview

What is Hadoop?

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Read more
Recent Reviews

TrustRadius Insights

Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is in storing and analyzing log data, …
Continue reading

Hadoop Review

7 out of 10
May 16, 2018
Incentivized
It is massively being used in our organization for data storage, data backup, and machine learning analytics. Managing vast amounts of …
Continue reading
Read all reviews
Return to navigation

Product Demos

Installation of Apache Hadoop 2.x or Cloudera CDH5 on Ubuntu | Hadoop Practical Demo

YouTube

Big Data Complete Course and Hadoop Demo Step by Step | Big Data Tutorial for Beginners | Scaler

YouTube

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn

YouTube
Return to navigation

Product Details

What is Hadoop?

Hadoop Video

What is Hadoop?

Hadoop Technical Details

Operating SystemsUnspecified
Mobile ApplicationNo

Frequently Asked Questions

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Reviewers rate Data Sources highest, with a score of 8.7.

The most common users of Hadoop are from Enterprises (1,001+ employees).
Return to navigation

Comparisons

View all alternatives
Return to navigation

Reviews and Ratings

(270)

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources. Have feedback on this content? Let us know!

Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is in storing and analyzing log data, financial data from systems like JD Edwards, and retail catalog and session data for an omnichannel experience. Users have found that Hadoop's distributed processing capabilities allow for efficient and cost-effective storage and analysis of large amounts of data. It has been particularly helpful in reducing storage costs and improving performance when dealing with massive data sets. Furthermore, Hadoop enables the creation of a consistent data store that can be integrated across platforms, making it easier for different departments within organizations to collect, store, and analyze data. Users have also leveraged Hadoop to gain insights into business data, analyze patterns, and solve big data modeling problems. The user-friendly nature of Hadoop has made it accessible to users who are not necessarily experts in big data technologies. Additionally, Hadoop is utilized for ETL processing, data streaming, transformation, and querying data using Hive. Its ability to serve as a large volume ETL platform and crunching engine for analytical and statistical models has attracted users who were previously reliant on MySQL data warehouses. They have observed faster query performance with Hadoop compared to traditional solutions. Another significant use case for Hadoop is secure storage without high costs. Hadoop efficiently stores and processes large amounts of data, addressing the problem of secure storage without breaking the bank. Moreover, Hadoop enables parallel processing on large datasets, making it a popular choice for data storage, backup, and machine learning analytics. Organizations have found that it helps maintain and process huge amounts of data efficiently while providing high availability, scalability, and cost efficiency. Hadoop's versatility extends beyond commercial applications—it is also used in research computing clusters to complete tasks faster using the MapReduce framework. Finally, the Systems and IT department relies on Hadoop to create data pipelines and consult on potential projects involving Hadoop. Overall, the use cases of Hadoop span across industries and departments, providing valuable solutions for data collection, storage, and analysis.

Attribute Ratings

Reviews

(1-18 of 18)
Companies can't remove reviews or game the system. Here's why
Chantel Moreno | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Apache Hadoop is one of the most effective and efficient software which has been storing and processing an extremely colossal amount of data in my company for a long time now. The software Hadoop is primarily used for data collection of large amounts, storage as well as for analytics. From my experience, I have to say that Hadoop is extremely useful and has a reliable plus valid purpose.
  • The various modules sometimes are pretty challenging to learn but at the same time, it has made Hadoop easy to implement and perform.
  • Hadoop comprises a thoughtful file system which is called as Hadoop Distributed File System that beautifully processes all components and programs.
  • Hadoop is also very easy to install so this is also a great aspect of Hadoop as sometimes the installation process is so tricky that the user loses interest.
  • Customer support is quick.
  • As much as I really appreciate Hadoop there are certain cons attached to it as well. I personally think that Hadoop should work attentively towards their interactive querying platforms which in my opinion is quite slow as compared to other players available in the market.
  • Apart from that, a con that I have noticed is that there are many modules that exist in Hadoop so due to the higher number of modules it becomes difficult and time-consuming to learn and ace all of them.
Apache Hadoop is majorly suited for companies that have large amounts of unstructured data flow like advertising and even web traffic so I feel that Hadoop is a great option when you have the extra bulk of data that is required to be stored and processed on a continuous basis. Moreover, I do recommend Hadoop but at the same time, I would also hope and suggest that the software of Hadoop gets supplemented with a faster and interactive database so that the overall querying service gets better.
Blake Baron | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
It's used organization-wide for older data that's not used as frequently. We use Teradata to warehouse our more recent data, but for data we don't access as often, it's migrated to Hadoop. It addresses the problem of securely storing data without paying the fortune that most warehouses charge for premium cloud storage.
  • Accessible
  • Inexpensive
  • User friendly
  • Much slower than more premium platforms
  • Doesn't connect with other data warehouses
  • Not mainstream -- somewhat more, "hacky" of a solution
Need cheap enterprise-level storage for data that is necessary to keep but isn't regularly accessed? Hadoop is the option for you. If you regularly have analysts or apps accessing the data warehouse, look for something more premium such as Teradata. The good news is that general SQL knowledge transfers well to this warehouse.
Score 8 out of 10
Vetted Review
Verified User
Incentivized
It is being used at our Fortune 500 clients. It is great for storage, but it is not well understood by the business. The challenge is that it requires very sophisticated data scientists to use properly and in parallel, but the data scientists turn the data on its head, causing IT execution issues. This has forced IT to restructure data in a denormalized form so the business users can actually be productive. This is a big trend in organizations.
  • Great for inexpensive storage, when originally introduced.
  • Distributed processing
  • Industry standard
  • Network fabric needs to be more sophisticated.
  • Need centralized storage.
  • The three copy of data should have been in the original design, not years later.
  • Consider deploying Spectrum Scale in these environments.
Massive processing in a distributed environment with data that can be distributed. Research environments. Lab environments would also be a good use for Hadoop. Hadoop can also be used in support of Spark environments and used by Frameworks if deployed properly. The best scenario is with a Data Scientist that understands how to program appropriately.
Johanes Siregar | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
Currently, there are two directorates using Hadoop for processing a vast amount of data from various data sources in my organization. Hadoop helps us tackle our problem of maintaining and processing a huge amount of data efficiently. High availability, scalability and cost efficiency are the main considerations for implementing Hadoop as one of the core solutions in our big-data infrastructure.
  • Scalability is one of the main reasons we decided to use Hadoop. Storage and processing power can be seamlessly increased by simply adding more nodes.
  • Replication on Hadoop's distributed file system (HDFS) ensures robustness of data being stored which ensures high-availability of data.
  • Using commodity hardware as a node in a Hadoop cluster can reduce cost and eliminates dependency on particular proprietary technology.
  • User and access management are still challenging to implement in Hadoop, deploying a kerberized secured cluster is quite a challenge itself.
  • Multiple application versioning on a single cluster would be a nice to have feature.
  • Processing a large number of small files also becomes a problem on a very large cluster with hundreds of nodes.
Hadoop is well suited for internal projects in a secure environment without any external exposure. It also excels well in storing and processing large amounts of data. It is also suitable to be implemented as a data repository for data-intensive applications which require high data availability, a significant amount of memory and huge processing power. However, it is not appropriate to implement as a near real-time solution which needs a high response time with a high number of high transactions per seconds.
Score 10 out of 10
Vetted Review
Verified User
Incentivized
Hadoop has been an amazing development in the world of Big Data. Where relational databases fall short with regard to tuning and performance, Hadoop rises to the occasion and allows for massive customization leveraging the different tools and modules. We use Hadoop to input raw data and add layers of consolidation or analysis to make business decisions about disparate datapoints.
  • Hadoop can take loads of data quickly and performs well under load.
  • Hadoop is customizable so that nearly any business objective can be justified with the right combination of data and reports.
  • Hadoop has a lot of great resources, both informal like the community and formal like the supported modules and training.
  • Hadoop is not a relational database, but it has the ability to add modules to run sql-like queries like Impala and Hive.
  • Hadoop is open source and has many modules. It can be difficult without context to know which modules to leverage.
Hadoop is well suited for organizations with a lot of data, trying to justify business decisions with data-driven KPIs and milestones. This tool is best utilized by engineers with data modeling experience and a high-level understanding of how the different data points can be used and correlated. It will be challenging for people with limited knowledge of the business and how data points are created.
Mark Gargiulo | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
We needed a robust/redundant system to run multiple simultaneous jobs for our ETL pipeline, this needed distributed storage space, integration with Windows AD user accounts and the ability to expand when needed with little to no downtime.
We are using Cloudera 5.6 to orchestrate the install (along with puppet) and manage the hadoop cluster.
  • The distributed replicated HDFS filesystem allows for fault tolerance and the ability to use low cost JBOD arrays for data storage.
  • Yarn with MapReduce2 gives us a job slot scheduler to fully utilize available compute resources while providing HA and resource management.
  • The hadoop ecosystem allows for the use of many different technologies all using the same compute resources so that your spark, samza, camus, pig and oozie jobs can happily co-exist on the same infrastructure.
  • Without Cloudera as a management interface the hadoop components are much harder to manage to ensure consistency across a cluster.
  • The calculations of hardware resources to job slots/resource management can be quite an exercise in finding that "sweet spot" with your applications, a more transparent way of figuring this out would be welcome.
  • A lot of the roles and management pieces are written in java, which from an administration perspective can have there own issues with garbage collection and memory management.
Hadoop is not for the faint of heart and is not a technology per se but an ecosystem of disparate technologies sitting on top of HDFS. It is certainly powerful but if, like me, you were handed this with no prior knowledge or experience using or administering this ecosystem the learning curve can be significant and ongoing having said that I don't think currently there are many other opensource technologies that can provide the flexibility in the "big data" arena especially for ETL or machine learning.
Muhammad Fazalul Rahman | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
Hadoop is not used as a norm in my organization. I just use it personally to complete my job faster. It is implemented in the research computing cluster to be used by faculty and students. It completes jobs faster by parallelizing the tasks using MapReduce framework. This gives me considerable speed in the tasks I perform.
  • Provides a reliable distributed storage to store and retrieve data. I am able to store data without having to worry that a node failing might cause the loss of data.
  • Parallelizes the task with MapReduce and helps complete the task faster. The ease of use of MapReduce makes it possible to write code in a simple way to make it run on different slaves in the cluster.
  • With the massive user base, it is not hard to find documentation or help relating to any problem in the area. Therefore, I rarely had any instances where I had to look for a solution for a really long time.
  • I would have hoped for a simpler interface if possible, so that the initial effort that had to be spent would have been much less. I often see others who are starting to use hadoop are finding it hard to learn.
  • I'm not sure if it is a problem with the organization and the modules they provide, but sometimes I wish there were more modules available to be used.
If the user is trying to complete a task quickly and efficiently, then Hadoop is the best option for them. However, it may happen that the deadline for the submission is close and the user has little or no knowledge of Hadoop. In this case, it is easier not to use hadoop since it takes time to learn.
Tom Thomas | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
The company I worked at used Hadoop clusters for processing huge datasets. They had several nodes for both production and per-production nodes. It allowed distributed processing of data across several clusters with an easy to use software model. It is used by the Systems and IT department at my company.
  • HDFS provides a very robust and fast data storage system.
  • Hadoop works well with generic "commodity" hardware negating the need for expensive enterprise grade hardware.
  • It is mostly unaffected by system and hardware failures of nodes and is self-sustained.
  • While its open source nature provides a lot of benefits, there are multiple stability issues that arise due to it.
  • Limited support for interactive analytics.
Hadoop is a very powerful tool that can be used in almost any environment where huge scale processing of data across clusters is required. It provides multiple modules such as HDFS and MapReduce that will make managing and analyzing said data reliable and efficient. Hadoop is a new and constantly evolving tool, and hence it needs users to be on top of it all the time.
February 23, 2016

Hadoop quick review

Score 9 out of 10
Vetted Review
Verified User
Incentivized
We have Hadoop pre-prod and prod clusters. Production clusters are comprised of 200 nodes. And we have realtime clusters as well. All the data will be moved to Hadoop. We use Hadoop to do machine learning and data warehousing.
  • Machine Learning Model, when SAS can not process 3 of years data. Hadoop is good tool to build the model.
  • Data warehousing is also another good use case. Using Teradata is expensive.
  • A lot of people are not from a programming background which makes Hue very important for end users when starting the Hadoop journey. Making Hue more user friendly and functional will be helpful for end users who don't much of a programming background.
Data is growing and grows fast. A relationship database can't hold this requirement any more. Real-time applications and distributed design are required for highly scalability and fault tolerance.
Tushar Kulkarni | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
I have been working with Hadoop since last year. It is very user friendly. Hadoop was used by the data center management team. It allows distributed processing of huge amount of data sets across clusters of computers using simple programming models.
  • It is robust in the sense that any big data applications will continue to run even when individual servers fail.
  • Enormous data can be easily sorted.
  • It can be improved in terms of security.
  • Since it is open source, stability issues must be improved.
Hadoop is really very useful when dealing with big data.
Pierre LaFromboise | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
We utilize Hadoop primarily as a large data staging area for disparate corporate data. Select data is aggregated and moved downstream to a more formal data warehouse. Some data analytics is also performed directly against the Hadoop stored data. The direct analytics is done primarily with Apache Spark utilizing Scala and Python.
  • No requirement for schema on write.
  • Ability to scale to massive amounts of data.
  • Open platform provides multiple options and customizations to fit your exact needs.
  • The platform is still maturing and can be confusing to research and use. Basic tasks can still be manual and are not always user friendly.
A big data problem doesn't always mean huge volumes of data. The other V's of big data (velocity and variety) are also important factors that may lead to selecting Hadoop as a platform.
Mrugen Deshmukh | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
I have used Hadoop for building business feeds for a telecom client. The major purpose for using Hadoop was to tackle the problem of gaining insights into the ever growing number of business data. We leveraged the map reduce programming model to churn more than 30 gigabytes of data per day into actionable and aggregated data which was further leveraged by campaign teams to design and shape marketing and by product teams to envision new customer experiences.
  • Hadoop is an excellent framework for building distributed, fault tolerant data processing systems which leverage HDFS which is optimized for low latency storage and high throughput performance.
  • Hadoop Map reduce is a powerful programming model and can be leveraged directly either via use of Java programming language or by data flow languages like Apache Pig.
  • Hadoop has a reach eco system of companion tools which enable easy integration for ingesting large amounts of data efficiently from various sources. For example Apache Flume can act as data bus which can use HDFS as a sink and integrates effectively with disparate data sources.
  • Hadoop can also be leveraged to build complex data processing and machine learning workflows, due to availability of Apache Mahout, which uses the map reduce model of Hadoop to run complex algorithms.
  • Hadoop is a batch oriented processing framework, it lacks real time or stream processing.
  • Hadoop's HDFS file system is not a POSIX compliant file system and does not work well with small files, especially smaller than the default block size.
  • Hadoop cannot be used for running interactive jobs or analytics.
1. How large are your data sets? If your answer is few gigabytes, Hadoop may be overkill for your needs.
2. Do you require real-time analytical processing? If yes, Hadoop's map reduce may not be a great asset in that scenario.
3. Do you want to want to process data in a batch processing fashion and scale for TeraBytes size clusters? Hadoop is definitely a great fit for your use case.
Sudhakar Kamanboina | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
Hadoop is used by data center management team. Hadoop processes the metric data pushed by virtual machines. Hadoop's output is served to the analytics engine and respective actions are taken to maintain even load on machines.
  • Processing huge data sets.
  • Concurrent processing.
  • Performance increases with distribution of data across multiple machines.
  • Better handling of unstructured data.
  • Data nodes and processing nodes
  • Make Haadop lighweight.
  • Installation is very difficult. Make it more user friendly.
  • Introduce a feature that works with continuous integration.
Ask about how Hadoop fits in your environment and how fast it processes streaming data.
Gaurav Kasliwal | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
I have been using Hadoop for 2 years and I really find it very useful, especially working with bigger datasets. I have used Hadoop and Mahout for my project to analyze and learn different patterns from Yelp Dataset. It was really very easy and user friendly to use.

  • Scalability. Hadoop is really useful when you are dealing with a bigger system and you want to make your system scalable.
  • Reliable. Very reliable.
  • Fast, Fast Fast!!! Hadoop really works very fast, even with bigger datasets.
  • Development tools are not that easy to use.
  • Learning curve can be reduced. As of now, some skill is a must to use Hadoop.
  • Security. In today's world, security is of prime importance. Hadoop could be made more secure to use.
Hadoop is really useful for larger datasets. It is not very useful when you are dealing with a smaller dataset.
November 11, 2015

Advantage Hadoopo

Ajay Jha | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
We are using it for Retail data ETL processing. This is going to be used in whole organization. It allows terabytes of data to be processed in faster manner with scalability.
  • Processes big volume of data using parallelism in faster manner.
  • No schema required. Hadoop can process any type of data.
  • Hadoop is horizontally scalable.
  • Hadoop is free.
  • Development tools are not that friendly.
  • Hard to find hadoop resources.
Hadoop is not a replacement of a transactional system such as RDBMS. It is suitable for batch processing.
Bhushan Lakhe | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Hadoop is used for storing and analyzing log data (logs from warehouse loads or other data processing) as well as storing and retrieving financial data from JD Edwards. It's also planned to be used for archival. Hadoop is used by several departments within our organization. Currently, we are paying a lot of money for hosting historical data and we plan to move that to Hadoop; reducing our storage costs. Also, we got a much better performance out of our Hadoop cluster for processing a large amount of financial data. So, in that senese, Hadoop addressed multiple business problems for us.
  • Hadoop stores and processes unstructured data such as web access logs or logs of data processing very well
  • Hadoop can be effectively used for archiving; providing a very economic, fast, flexible, scalable and reliable way to store data
  • Hadoop can be used to store and process a very large amount of data very fast
  • Security is a piece that's missing from Hadoop - you have to supplement security using Kerberos etc.
  • Hadoop is not easy to learn - there are various modules with little or no documentation
  • Hadoop being open-source, testing, quality control and version control are very difficult
Hadoop is best suited for warehouse or OLAP processing. It's not suitable for OLTP or small transaction processing
Score 10 out of 10
Vetted Review
Verified User
My company's new cloud based architecture is Hadoop based . It is being used across several organizations in our company . Using Hadoop our company has been able to solve many big data problems faster with very high performance.
  • Cost Effective
  • Distributed and Fault Tolerant
  • Easily Scalable
  • Cluster management and debugging is kind of not user friendly ( Doesn't has many tools )
  • More focus should be given to Hadoop Security
  • Single Master Node
  • More user adoption ( Even though it is increasing by each day )
Hadoop is best suited for processing and analyzing unstructured and huge volumes of data . So ask yourself if the problem you are trying to solve involves unstructured data and also the volume .
Score 10 out of 10
Vetted Review
Verified User
Hadoop is part of the overall Data Strategy and is mainly used as a large volume ETL platform and crunching engine for proprietary analytical and statistical models. The biggest challenge for developers/users is moving from an RDBMS query approach for accessing data to a schema on read and list processing framework. The learning curve is steep upfront, but Hive and end user tools like Datameer can help to bridge the gap. Data governance and stewardship are of key importance given the fluid nature of how data is stored and accessed.
  • Gives developers and data analysts flexibility for sourcing, storing and handling large volumes of data.
  • Data redundancy and tunable MapReduce parameters to ensure jobs complete in the event of hardware failure.
  • Adding capacity is seamless.
  • Logs that are easier to read.
Not an RDBMS - not well suited for traditional BI applications.
Return to navigation